Nuances of astronomical data

  1. Observational rather than from designed experiments.

  2. Calibration is crucial for connecting observations to physics.

  3. Sparsity is inevitable.

  4. Objects are mixed:

    1. Different life stages and time-scales of evolution.
    2. Time-scales are much longer than we can observe.

  5. Measurement error is heteroscedastic.

‘Models’ and ‘Models’

Astrophysics

A parsimonious mathematical representation of the expected signal from a physical process that generates an emission detectable by instruments.

Statistics

A stochastic representation of the data-generating process that accounts for the discrepancy between the astrophysical model and the data.

Example: Photon counts

  • \(\mathrm{Poisson}(g(\theta))\) as a statistical model for photon counts.

  • Astrophysical model specifies \(g(\theta)\) as a power-law relation.
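To make the split between the two kinds of model concrete, here is a minimal sketch in Python. The power-law form `norm * energy**(-index)`, the bin energies, and the counts are all hypothetical choices for illustration; the source only says that \(g(\theta)\) is a power-law relation.

```python
import math

def power_law_rate(energy, norm, index):
    """Astrophysical model g(theta): expected counts at a given energy."""
    return norm * energy ** (-index)

def poisson_log_likelihood(counts, energies, norm, index):
    """Statistical model: sum of log Poisson(n_i | g_i(theta)) over bins."""
    total = 0.0
    for n, e in zip(counts, energies):
        mu = power_law_rate(e, norm, index)
        total += n * math.log(mu) - mu - math.lgamma(n + 1)
    return total

# Hypothetical data: bin energies (arbitrary units) and observed photon counts.
energies = [1.0, 2.0, 4.0, 8.0]
counts = [20, 9, 6, 2]

# Compare two candidate values of theta = (norm, index).
ll_a = poisson_log_likelihood(counts, energies, norm=20.0, index=1.0)
ll_b = poisson_log_likelihood(counts, energies, norm=20.0, index=3.0)
print(ll_a, ll_b)
```

The physics lives entirely in `power_law_rate`; the Poisson log-likelihood is what turns it into a statement about noisy counts, so either piece can be swapped without touching the other.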

Uncertainty quantification is at the heart of statistical models.

Example: Time delay estimation

  • Model misspecification can cause spurious results.

  • Different model fits on the same data can reveal completely different possibilities.

  • Don’t blindly base inference on the highest mode of the posterior distribution, or on the smallest value of the loss function in ML.

General advice

  • The better the statistical and astrophysical models reflect the data, the better the quality of what the data reveal to us.

  • Carefully consider implicit modelling and statistical assumptions; these will affect scientific findings.

…all models are wrong but some are useful.

(Box & Draper, 1987)

Six Maxims

  1. All data have stories, but some are mistold.

  2. All assumptions are meant to be helpful, but some can be harmful.

  3. All prior distributions are informative, even those that are uniform.

  4. All models are subject to interpretation, but some are less contrived.

  5. All statistical tests have thresholds, but some are mis-set.

  6. All model checks consider variations of the data, but some variants are more relevant than others.

All data have stories, but some are mistold.

Sampling Effects

  • Measurements of the properties of individual objects have become more accurate.

  • Better accuracy is not the same as a more representative sample of objects.

Selection Effects

  • Data are often collected for a specific purpose.

  • These are not uniformly random selections of data!

  • Characteristics of individual surveys can affect the overall interpretation when surveys are combined.

    • e.g., training ML models on convenience samples can lead to biased results.
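A small simulation can illustrate how a selection effect distorts a summary statistic. This is only a sketch: the Gaussian population of log-luminosities and the hard detection threshold are hypothetical, chosen to mimic a flux-limited survey.

```python
import random
import statistics

random.seed(1)

# Hypothetical population: log-luminosities drawn from a standard Gaussian.
population = [random.gauss(0.0, 1.0) for _ in range(20000)]

# Hypothetical survey: only objects brighter than a threshold are detected.
threshold = 1.0
detected = [x for x in population if x > threshold]

population_mean = statistics.fmean(population)
detected_mean = statistics.fmean(detected)

print(f"population mean: {population_mean:.3f}")
print(f"detected mean:   {detected_mean:.3f} from {len(detected)} objects")
```

Each detected object is measured perfectly here, yet the detected sample's mean is far from the population's. Anything trained or fitted on `detected` alone inherits that bias.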

Other data stories

  • Preprocessing can introduce statistical and systematic errors.

  • Calibration measurements are imperfect since these themselves include measurement errors and systematics.

All assumptions are meant to be helpful, but some can be harmful.

Non-Gaussianity

Be wary of:

  • Outlying observations
  • Low Poisson counts
  • Background subtraction
  • Error propagation
  • Binned data
  • Heavy-tailed and asymmetric distributions.

Heteroscedasticity

  • Homoscedasticity is the assumption that errors all come from a distribution with a fixed variance.

  • Astronomical data often quote ‘one-sigma’ measurement-error uncertainties that are heteroscedastic.

    • Q: What is the distribution of these errors? Gaussian?
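One common way to use quoted one-sigma errors is an inverse-variance weighted mean. The sketch below assumes the errors are independent and Gaussian, which is exactly the assumption the question above asks you to check. The function name and the numbers are hypothetical.

```python
import math

def weighted_mean(values, sigmas):
    """Inverse-variance weighted mean and its standard error.

    Assumes independent Gaussian errors with known standard deviations.
    """
    weights = [1.0 / s ** 2 for s in sigmas]
    weight_sum = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / weight_sum
    return mean, math.sqrt(1.0 / weight_sum)

# Hypothetical measurements: the third is far noisier than the others.
values = [10.0, 12.0, 20.0]
sigmas = [1.0, 1.0, 10.0]

wm, wm_err = weighted_mean(values, sigmas)
plain_mean = sum(values) / len(values)
print(wm, wm_err, plain_mean)
```

The unweighted mean treats the noisy third point as equally trustworthy and is pulled towards it; the weighted mean largely ignores it. If the quoted sigmas are wrong or the errors are not Gaussian, the weighting is wrong too.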

Checking assumptions

  • Residuals analysis.
  • Sensitivity of fits to starting values.

Check the fit in light of domain science knowledge, instead of blindly proceeding with the highest mode or other computed summary as the best model fit.

Example: \(\chi^2\) statistic

  • \(\chi^2\) minimisation assumes Gaussian approximation of measurement errors.

  • Using a Gaussian approximation of Poisson count data is not valid when the variance is very different from the average count.

    • evidence of ‘overdispersion’ or ‘zero-inflation’.

  • Try fitting counts directly and use a different fit statistic.
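As an illustration of fitting counts directly, the sketch below compares a \(\chi^2\) fit with a Poisson-based fit statistic for a constant-rate model. The Cash statistic is used here as one example of a "different fit statistic"; the source does not name a specific one. The counts and the grid search are hypothetical.

```python
import math

def cash_statistic(counts, mu):
    """Cash statistic for a constant expected count mu (Poisson likelihood)."""
    total = 0.0
    for n in counts:
        total += mu - n
        if n > 0:  # the n * log(n / mu) term vanishes when n == 0
            total += n * math.log(n / mu)
    return 2.0 * total

def neyman_chi2(counts, mu):
    """Chi-square using the observed counts as variances (requires n > 0)."""
    return sum((n - mu) ** 2 / n for n in counts)

# Hypothetical low-count data.
counts = [1, 2, 3, 1, 4, 2, 5, 2]

# Simple grid search over the constant rate mu.
grid = [0.5 + 0.001 * k for k in range(5000)]
best_cash = min(grid, key=lambda mu: cash_statistic(counts, mu))
best_chi2 = min(grid, key=lambda mu: neyman_chi2(counts, mu))

print(f"sample mean:        {sum(counts) / len(counts):.3f}")
print(f"Cash best-fit rate: {best_cash:.3f}")
print(f"chi2 best-fit rate: {best_chi2:.3f}")
```

The Cash fit recovers the sample mean, which is the Poisson maximum-likelihood estimate of a constant rate. The \(\chi^2\) fit with data-based variances lands on the harmonic mean of the counts and is biased low.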

All prior distributions are informative, even those that are uniform.

Uniform priors everywhere!

Interpretation of credible intervals hinges on the interpretation of the prior distribution.

\[\log(X) \sim \mathrm{Unif}[a,b]\]

is not non-informative on the original scale of \(X\).
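A quick simulation shows how much structure this prior carries on the original scale. It is a sketch with hypothetical bounds: the natural logarithm is assumed, with \(a = 0\) and \(b = \log(1000)\), so \(X\) ranges over \([1, 1000]\). The implied density of \(X\) is proportional to \(1/x\).

```python
import math
import random

random.seed(0)

# Hypothetical bounds on log(X): X ranges over [1, 1000].
a, b = 0.0, math.log(1000.0)

# Draw from the prior log(X) ~ Unif[a, b] and transform back to X.
draws = [math.exp(random.uniform(a, b)) for _ in range(100000)]

# Prior mass assigned to X < 10 under the log-uniform prior.
frac_below_10 = sum(x < 10.0 for x in draws) / len(draws)

# Prior mass on the same interval if X itself were uniform on [1, 1000].
frac_uniform_x = (10.0 - 1.0) / (1000.0 - 1.0)

print(f"log-uniform prior: P(X < 10) = {frac_below_10:.3f}")
print(f"uniform prior:     P(X < 10) = {frac_uniform_x:.3f}")
```

About a third of the prior mass sits below \(X = 10\) under the log-uniform prior, against under 1% for a prior that is uniform in \(X\). Neither choice is neutral; each encodes a different belief about scale.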

Bounded priors

  • Bounded uniform priors must be used with care because these completely exclude portions of the parameter space.

  • Set uniform bounds wide enough that they do not cut off regions where the likelihood is appreciable.

All models are subject to interpretation, but some are less contrived.

Physics First

It is best to start with the physics and then consider whether the empirical findings make sense in terms of that physics and, if they do not, how we can make sense of them.

All statistical tests have thresholds, but some are mis-set.

p-values

  • In frequentist statistics, the p-value is a measure of the inconsistency between the observed data and an assumed distribution under some null hypothesis.

  • Astronomers typically use more conservative thresholds than biomedical researchers, e.g., \(\alpha = 0.0027\) (3\(\sigma\)) vs \(\alpha = 0.05\).
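The link between "n-sigma" language and a p-value threshold can be checked with the two-sided Gaussian tail probability. The sketch below assumes a two-sided test against a standard normal reference distribution; the function name is hypothetical.

```python
import math

def two_sided_p_value(n_sigma):
    """Two-sided tail probability of a standard normal beyond n_sigma."""
    return math.erfc(n_sigma / math.sqrt(2.0))

p_196 = two_sided_p_value(1.96)  # close to the conventional 0.05
p3 = two_sided_p_value(3.0)      # close to 0.0027
p5 = two_sided_p_value(5.0)      # the "5 sigma" convention

print(f"1.96 sigma: {p_196:.4f}")
print(f"3 sigma:    {p3:.6f}")
print(f"5 sigma:    {p5:.3e}")
```

A one-sided convention halves each of these values, so it is worth stating which one a quoted threshold uses.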

Family-wise Error Rate (FWER)

Probability of committing at least one Type I Error (false-positive) among \(m\) hypothesis tests:

\[\mathrm{FWER} \le 1 - (1 - \alpha_\mathrm{individual})^m\]

For 20 tests, each at the 3\(\sigma\) level (\(\alpha = 0.0027\)):

1 - (1 - 0.0027)^20
[1] 0.05263708

  • FWER corrections are often too conservative for astronomical applications.

  • Consider controlling the False Discovery Rate (FDR).
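One standard way to control the FDR is the Benjamini-Hochberg step-up procedure. It is shown here as an illustration; the source does not prescribe a particular method. The p-values are hypothetical, and the procedure assumes independent (or positively dependent) tests.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return sorted indices of hypotheses rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    # Find the largest rank k with p_(k) <= q * k / m.
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= q * rank / m:
            cutoff = rank
    # Reject every hypothesis ranked at or below the cutoff.
    return sorted(order[:cutoff])

# Hypothetical p-values from 10 tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

rejected_fdr = benjamini_hochberg(p_values, q=0.05)
rejected_bonferroni = [i for i, p in enumerate(p_values)
                       if p <= 0.05 / len(p_values)]

print("BH rejections:        ", rejected_fdr)
print("Bonferroni rejections:", rejected_bonferroni)
```

On these values the FDR procedure rejects two hypotheses where a Bonferroni correction rejects one. The price is that a controlled fraction of the rejections is expected to be false.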

All model checks consider variations of the data, but some variants are more relevant than others.

Quantifying uncertainty

  • Replication is a way to estimate variability and uncertainty.

  • But how these replicates are generated will influence the estimated uncertainties, e.g., same observational setup, different instruments, etc.

  • Conditional tests provide more statistical power.
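When new observations are not available, resampling the data at hand is one way to generate replicates. The sketch below uses a nonparametric bootstrap, which is an assumption of this example and not a method named in the source. It treats the measurements as independent draws from a single population, and the data are simulated.

```python
import random
import statistics

random.seed(42)

# Hypothetical measurements (simulated for illustration).
data = [random.gauss(5.0, 2.0) for _ in range(200)]

def bootstrap_se(sample, n_boot=2000):
    """Standard error of the mean estimated from bootstrap replicates."""
    means = []
    for _ in range(n_boot):
        # Each replicate resamples the data with replacement.
        resample = random.choices(sample, k=len(sample))
        means.append(statistics.fmean(resample))
    return statistics.stdev(means)

se_boot = bootstrap_se(data)
se_formula = statistics.stdev(data) / len(data) ** 0.5

print(f"bootstrap SE: {se_boot:.4f}")
print(f"formula SE:   {se_formula:.4f}")
```

The two estimates agree because the resampling scheme matches how the data were generated. If the real replicates would differ in instrument or observing setup, this scheme captures none of that variation and understates the uncertainty.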

Authors’ Conclusions

  • There is no silver bullet technique that can be applied without considering limitations and assumptions.

  • Researchers are encouraged to check against these six maxims to improve their data analytic practice.

Remarks

  • Very general advice, applicable to other domains beyond astronomy.

  • Great to acknowledge the data collection biases which are not present elsewhere.

  • Could be more forceful with the benefits, e.g., stronger claims, fewer false conclusions.

  • The astronomy community needs to include discussion about sound astronomical data analyses as well as substantive domain questions.